Christina's LIS Rant: 06/01/2008


Monday, June 30, 2008

SLA2008: Physics roundtable

I think this should be it.

APS - view plus - for the visually impaired. Work with publishers to develop a cost effective way to make their content available to the visually impaired
(I said I would link to handouts - but the SLA site is down again)

Pat Viele introduced Bruce Mason, who is working on ComPADRE, a very cool resource for physics and astro educational resources. Things are cataloged so that you can find them - neat simulations and games and resource sites at all levels. Adopt a physicist - for your classroom so that you can send him or her questions.

Michael Fosmire talked about their institutional repository (and what is slick about this is that the department does a lot of the work - they were doing it before, too, but it's not on the shoulders of the scientists)
- opt out policy
admin adds to endnote > appropriate copy is identified (looking at Sherpa/Romeo, etc) by library assistant > physics librarian > export to xml > digital commons batch load > record even if no full text

some little details about Physics Today - you can ask some attendees what the details were

big budget cuts in the SUNY system... I'm sure other state universities are in the same boat.

Labels: SLA2008

¶ 6:11 PM

SLA2008: Astro Roundtable

Kerry Kroffe from IOP presented on the transition of AAS journals to IOP. He's headquartered in DC.
They ported over and are cleaning up older articles (5k articles, 122 issues). The peer review system had to be customized to reflect the different way that AAS does peer review (from other IOP hosted journals - more collaborative effort with more revisions to get the best paper possible). The articles were moved over and then the access control later - with very few problems. Copy editing is being done in India (!). Acceptance to web posting time decreased 50% and they're looking to get it down to 1 month.
New features
- electronic page charges
- VO tables (sort of xml vice VizieR which was machine readable ascii)
- object and dataset linking, link out to source data (now that's cool)

to come
- hosting the data behind the pictures
- data cube - but figure out the preservation -- anything that goes in the journal has to be preserved
- upswing in planetary papers (I've noticed that)

ApJ - 30k articles - 9/1 ApJ/ApJ supplements submissions
10/15 letters submissions
12 archives to IOP

Chris B. AAS Update
He's the in house journals manager for AAS.
-figure tables
-regional digital printing (lower carbon footprint - plus it just really makes sense)
- eventually - no print, print whole volume at a time, or get print each week - we'll be able to select. Users will be able to get print bundles of their articles or collections of interesting articles

Kerry? from International Planetarium Society about outreach

Donna Thompson from ADS
Crossref grant proposal (cool! more dois)
better coverage of related sciences
3.7-4.5M records in physics...
More AGU, T&F, Nature, Science (yay!)
Loaded A&A abstracts
better scans
education search - reviews and popular articles (i think she said)
new
- recommendation
- personalization
- visualization
- citation analysis
- make a list with annotations

IYA 2009
LISA VI (in India)

Labels: SLA2008

¶ 5:55 PM

SLA2008: Social Tools and the Enterprise

In this session Liz Lawley flew through a bunch of cool tools - which I've mostly seen before, but it's worth a reminder.
Her talk was focused on: productivity, presence, privacy, portability, play
Her bookmarks are at: http://del.icio.us/mamamusings/sla08 (if people hadn't been mobbing her I would have mentioned that the standard tag is sla2008, but oh well)

First - she used Flickr with cc license to replace stock photos.

presence - im integrates with calendar (check - we do that at MPOW), facebook newsfeeds, twitter for close friends, IRC backchannels (seems very old school), ambient displays
privacy - selective sharing - not binary off/on, social around something, not just social for social sake
portability - grand central beta - Google tool to play different voicemail message depending who called (I'm not 100% sure, but I think I can do something like this with my Cisco VOIP phone at work - it is integrated with my e-mail, which took some getting used to but I now like), goog411, jott.com (who was I trying to tell about this - I forget)- i should use this for twitter since i don't pay for the text messaging package on my phone (yes, I am *so* old school)
productive play - attent, passive multiplayer online gaming. She has some game so that students go on "missions" and get points for visiting the sites on her assignment sheets.

Labels: SLA2008

¶ 5:10 PM

SLA2008: cyber infrastructure - building bridges

This was also on Monday at 1:30.
Lucy Nowell, Program Director, NSF Office of Cyberinfrastructure presented on what her office does and what's needed -- in particular, she wanted to encourage librarians to be part of project teams who are proposing for this.
infrastructure is like power, water, transportation
research infrastructure is labs, observatories, libraries, computational infrastructure, professional societies
library infrastructure is catalogs, journals, citation indexes
computational infrastructure - internet, local networks, computers, scanners, high performance computing, virtual organizations, data

so cyberinfrastructure
virtual orgs for distributed communities
high performance computing data/visualization/interaction
training & workforce needs/opportunities

cyberinfrastructure can be top down but can also be science driven
IPY 1 data is available (all in paper) but IPY2 data is not available for use - it was digital
IPY 3 is going on right now, will we preserve and provide access to that data?

Why preserve data?
- irreversible (like earthquakes)
- for replication
- longitudinal analysis
- interdisciplinary research
- to broaden participation (citizen science but also with partners from less developed countries)

DataNet
- user centered, evolvable, multisector, sustainable, open, nimble, extensible
- longterm preservation and access to data
- technologically and economically sustainable
- empower science
- partners *will* include libraries, computer scientists, domain experts
i love this quote

" we don't want to protect data from users"

- submissions will be interdisciplinary and will deal with data through its lifecycle - will be foundational for other programs to build on

Labels: cyberinfrastructure, SLA2008

¶ 4:39 PM

SLA2008: KM at the Core, Facilitating Knowledge Sharing

This session was Monday at 9am and featured Dave Snowden - who is entertaining to listen to, even if he trashes my whole paradigm of research (harumph, if he can't tell anecdotal "evidence" from rigorous systematic qualitative research - that's because he doesn't know better, not because there isn't a difference!)

(podcast is available, and slides - apparently the same as from Limerick)
My notes were on paper - and this is 2 weeks later so...

KM (knowledge management) - is a theory or Weltanschauung with dysfunctional technology
Social Computing (social computing technologies - SCTs) - increasingly functional technology without theory or Weltanschauung

His goal is to build theory to use SCTs for KM (if there *can* be such a thing as KM) - given knowledge is volunteered, not conscripted. Lack of sharing: it's often not a matter of "knowledge is power" but the fear of abuse - so sharing happens with people who are trustworthy. By forcing knowledge sharing you get unusable stuff and you have to go back to the author.

Knowledge is contextual - you remember when triggered by a weak signal. The way people know something in the field is different from the way they describe something. Standard deal: know more than we can tell, can tell more than we can write down.
3 groups of knowledge:
know by doing - muscle memory, only learn by experience, successfully transferred through apprenticeship (he doesn't like the term tacit, though)
stuff we can tell - communicated by story, pre-rehearsal (so going over what you're going to do in a meeting before you do it), activate patterns (?)
stuff we can write down - severely limited because highly structured so "best practice" documents are expensive.

SCTs are working to support stuff we can tell, but there's no theory.
He likes Dervin's sensemaking (don't we all?), and using a complex systems approach - make sense of the world so we can act on it.
cogsci - the two predominant models - information processing and behaviorism are not supported by newer cognitive science research; therefore, the fundamentals of business research are misguided (at best)(!)

In an ordered system
- repeated relationships between cause and effect
- manufacturing > most km
- constraints on behavior - no degrees of freedom
- no innovation - but apply these methods in an unordered system and you'll have failure

Chaotic systems
- unconstrained - use statistics and probability theory

Complex adaptive systems
- system and agents co-evolve
- hindsight does not lead to foresight
- future is inherently uncertain
- very sensitive to starting conditions
- system level effects can be emergent

Flexible, negotiable boundaries
- use attractors to encourage good behavior and disruptors to discourage bad
- weak signal detection (to disrupt early) - good surveillance

initiate a system - with safe-fail experimentation - distributed cognition vice centralized cognition which has low resilience.

the Cynefin framework (no doubt TM)
"complex acts of knowing"
(lots of really illegible notes here - gee my handwriting is terrible)
probe - sense - respond, don't allow existing experts to dictate decisions based on best practices from historical data. "bounded applicability" need to increase dissent...

Narrative picks up more signals than analytical analysis (huh?). Don't confuse innovative with creative (yes, well, Jill and I also made that point)

Internal km systems - based on ordered system, formal processes with large chunks
wisdom of the crowds - distributed cognition - everybody must make a decision independent of everyone else - this is *not* a prediction market where you can see what decisions everyone else is making.
Cognitive differences help distributed cognition. Keep partial patterns - everything is fragmented - increase fragmentation - chunking and summarizing is too slow and loses important detail...

km has to be a request system, but you have to tell people what you know so they can find who to ask for information

and then my notes end...

Labels: km, SLA2008

¶ 3:58 PM

Saturday, June 28, 2008

Sources of more information for novices at SNA - or citation analysis

Before SLA I started a mini series of posts with some (what I hope is) practical how-to information for librarians and information professionals who might want to use social network analysis techniques to do citation analysis or bibliometric analysis for their library customers/patrons/users/whatever.

So this post is the last in that series - this is how to learn more.

Here's a book on scientometrics. May be a bit theoretical, but also some helpful advice:
Leydesdorff, L. A. (1995). The challenge of scientometrics: the development, measurement, and self-organization of scientific communications. Leiden: DSWO Press, Leiden University. (check worldcat for a local copy)

For help with SNA and the meaning of the various centrality measures, there's no better book than:
Wasserman, S., & Faust, K. (1997). Social network analysis: methods and applications. New York: Cambridge University Press. (check worldcat for a local copy)

For help with the software - well, that's sort of trial and error to be perfectly honest. Others can get Pajek to do amazing things, but I still find it quite complex. This online text does help quite a lot with UciNet:
Hanneman, Robert A. and Mark Riddle. 2005. Introduction to social network methods. Riverside, CA: University of California, Riverside ( published in digital form at http://faculty.ucr.edu/~hanneman/ )

This book helps with Pajek, but the program has updated since the book was written so be sure to go to the book's companion web site to learn about the changes (argh - I was trying to extract fragments using the book and didn't realize that it's done completely differently in the current version of the program!)
Nooy, W. d., Mrvar, A., & Batagelj, V. (2005). Exploratory social network analysis with Pajek. New York: Cambridge University Press.

There are lots of other software products and routines. Dr. Leydesdorff has made routines he's written available on his web site (watch your speakers- music plays when you hit his home page).

There are lots of ARIST, JASIST, and other articles on citation analysis, how citations work, and the value of various measures...
One more recent that I found interesting is:
Leydesdorff, L. (2008). Caveats for the use of citation indicators in research and journal evaluations. Journal of the American Society for Information Science and Technology, 59(2), 278-287. doi:10.1002/asi.20743

As for people - seems like many of them hang out on the SIGMETRICS list - but ... for them to help you (and they are very nice people who are very helpful), you need to have done the reading first and to clearly explain the trouble. You might try browsing the archives, too. I tried asking a question on the UCInet listserv - but got 0 (zero) responses - so good luck with that!

Classes:
The class I took on SNA at Maryland in the Sociology department is only offered every once in a while and not really on a set schedule. I don't know that other iSchools offer regular classes on citation analysis, either, and I'm not sure about independent studies. If anyone in the US offered this, I'd expect it to be Drexel, but I didn't see it in their catalog. Look in your institution in the business school, in criminology, in sociology, and in computer science - SNA classes could really be lurking in any of these places.

There are lots of classes held by the UCInet people and some consultants at various conferences like Sunbelt. (poke around here: http://www.insna.org/, once their site is back up - I always leave out a letter and end up on a support organization website)

If you've gotten this far and have a comment or question, feel free to ask and I'll help if I can.

¶ 1:48 PM

Swanson's Postulates of Impotence

(oh - this is going to get me *such* search engine traffic I don't want!)

I do so love the rantings of the cranky old men and women of information science. I hope to feature some of these on my blog as I continue to compile my comprehensive exam proposal as well as actually re-reading for my comprehensive exams.

I had forgotten about this article assigned in the Information Structure class taught by Rebecca Green. But it's a good one.

Swanson, D. R. (1988). Historical note: information retrieval and the future of an illusion. Journal of the American Society for Information Science, 39(2), 94-98.

Swanson is one of those big names in IR. He basically goes over a little of the history of IR and then puts forth, as suggested by Fairthorne, nine postulates of impotence - or things that cannot be done in IR - or at least in subject-oriented IR (as opposed to known item, for example). He suggests that these might be useful in developing new research directions and he hopes to start some arguments.

"an information need cannot be fully expressed as a search request that is independent of innumerable presuppositions of context -- context that itself is impossible to describe fully, for it includes among other things the requester's own background of knowledge"
can't write rules to precisely translate a request into a set of search terms
"a document cannot be considered relevant to an information need independently of all other documents that the requester may take into account"
can never get 100% recall (or be completely sure of the % recall you did get)
"machines cannot recognize meaning and so cannot duplicate what human judgment in principle can bring to the process of indexing and classifying documents. Corollary: Some indexers all of the time, and all indexers some of the time, also cannot duplicate what human judgment in principle can bring to the process of indexing."
"word-occurrence statistics can neither represent meaning nor substitute for it"
the process is iterative, so can't evaluate an ir system based only on a single iteration [more important now than ever, perhaps]
"you can have subtle relevance judgments or highly effective mechanized procedures, but not both"
"consistently effective fully automatic indexing and retrieval is not possible"

His point: humans are subtle, complex, and relevance judgments "entail... artful leaps of the imagination unconstrained by logic, reasoning, or the clammy hand of consistency..." But he does not deny that machines are incredibly important to IR - just that they cannot take us the whole way.
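
To make the recall postulate above concrete, here are the standard textbook definitions (my notation, not Swanson's). Let R be the set of documents in the collection that are relevant to the need and A the set the system actually retrieved:

```latex
\[
\text{recall} = \frac{|R \cap A|}{|R|},
\qquad
\text{precision} = \frac{|R \cap A|}{|A|}
\]
```

You can measure |A| and |R ∩ A| by judging what came back, so precision is knowable. But |R| would require judging every document in the collection, so recall can only ever be estimated - which is the "can't be completely sure of the % recall you did get" part.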

Wow, he studied the work of intelligence analysts in 1955.... and their polished analyses coming from large quantities of fragmented information.

He's not all negative - he talks about some of the things that can be done, too. But the entertaining bits are the couple of times when he mentions that ideas had been thought up in the 50s or earlier and then reinvented in the 70s and 80s. Of course, we're still reinventing these ideas now - some people think that just because there's a computer involved that information and how people deal with information is completely new. There are definite changes, but some things proposed in the 50s are now really possible.

Labels: information retrieval

¶ 12:50 PM

Friday, June 27, 2008

SLA2008: the PAM Blog

Be sure to check out the Physics-Astronomy-Math Division blog for more notes and updates on SLA2008.

I still have notes on paper for a couple of sessions that I will try to transcribe...

Labels: SLA2008

¶ 3:36 PM

Tuesday, June 17, 2008

SLA2008: CS Roundtable

Ebooks as textbooks
- Knovel and CRC books - can be used... one librarian who's been using these has had complaints from students that they can't find anything within the ebooks - need to print the whole thing.
- tried that for a chemistry class and they did not like it - they wanted print - maybe needs to be introduced lower
- maybe ebrary - so you can highlight and make notes?
- maybe if a book to read linearly instead of jumping back and forth
- synthesis lectures as course assignments? (and Springer) allow use in coursepacks - multiple users are allowed whereas systems that only allow a few users don't work for courseworks.

Federated search for ebooks?
OSU is within Ohiolink where they load things locally. -- user limit is a big deal with federated searching.

Lecture Notes in CS - online - how to publicize?
- engineering toolbar within firefox (Ohio University - see his poster session) to get to these.
- I suspect links from google aren't going to LNCS - from my experience
- SpringerLink's horrid search

my question - mushing together books and articles
- not a problem for some - categorizes things
- separate category or quick set
- worldcat local does this and it's a mess (but maybe non-libraries are ok with that)

ACM search engine improvements
- no reps, again!
- have a library advisory board - but haven't called on them for 2 years
- communications of the acm - relaunching - mostly staff steered in the past, but trying to get more to perspectives by experts on research instead of unsolicited manuscripts. less papers on MIS.
- for the author profiles, authors really need to add information to their profiles to make it more useful
- tell them we need to get links out from the acm guide to computing literature (open linking for link resolvers)

Errors in databases
- good experience w/IEEE, Inspec, WoS, ScienceDirect
- bad experience w/acm
- need proper contact for editorial feedback

- problems with Inspec on Ovid and resolver not working right
- international journal of high performance computing - moved to Sage - Sage photoshopped the cover image!!!!!

Parker's study of where to put CS

Labels: SLA2008

¶ 2:02 PM

SLA2008: Cyberinfrastructure Informatics Across the Biological Sciences

Biodiversity Informatics: Evolving in the Biological Sciences
Biodiversity Heritage Library
Cathy Norton, Woods Hole/Marine Biological Library

tuatara - outside has stayed the same but inside has changed completely, most quickly evolving animal... so a good metaphor for librarians.

Part of the Encyclopedia of Life project. Group of 10 museums that are scanning the taxonomic literature to form the base of the EOL.

Importance of all materials. Not just one type of discovery tool, not only libraries, equal group of contributors.

Goal: core information open access on the web. Can do this because early information - first time something described - still very important to taxonomists.

Partners: internet archive, publishers as content providers, copyright holders.

Domain: 5.4M books dating back to 1469, 50% pre-1923 so out of copyright. But also non-profit society journals who can't afford to digitize agree and also get copies for their page, and some agreements with commercial publishers.

How do they differ from google? Taxonomic information - including what has changed, different languages, historic information. Other organizations need this too - PubMed doesn't have all of these names. Example: this particular salamander, many different spellings so you would not get all of the articles. Reconciliation, link to alternative names for the same organism.

uBio - Universal Biological Index and Organizer. "Taxonomic intelligence is the inclusion of taxonomic practices skills and knowledge within informatics services to manage information about organisms" Giant index with 10.8 million names and merging maps - including common names. Linkage to other data types (molecular, morphological, phenotype). FindIT scientific name recognition algorithm (run after scanning and OCR-ing) - fairly easy: a name starts with upper case, then lower case, and is in italics. They are training and improving the algorithm.
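
To give a feel for the name-recognition heuristic described above (capitalized first word, lower-case second word), here is a minimal sketch in Python. This is only my illustration, not uBio's actual algorithm: the pattern, the function name candidate_names, and the sample sentence are all made up, and plain OCR text has lost the italics, so that cue is not checked.

```python
import re

# Hypothetical heuristic: a capitalized genus word followed by a
# lower-case species epithet of at least three letters.
# Italics are not available in plain text, so they are ignored here.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

def candidate_names(text):
    """Return candidate 'Genus species' strings found in text."""
    return [f"{genus} {species}" for genus, species in BINOMIAL.findall(text)]

# Made-up sample sentence, for illustration only.
sample = "Specimens of Ambystoma tigrinum were collected near the pond."
print(candidate_names(sample))
```

A pattern this simple will also pick up ordinary sentence openings ("Specimens were ..."), which is presumably why a real system needs the italics cue, a names index to check against, and the training mentioned in the talk.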

uBioRSS taxonomically intelligent RSS feed aggregator - their lab works on 200 organisms, so they go out and search selected sources on those organisms. Also present search results for their authors' articles right on the home page.

See also poster session tonight for more details.

Q: zoo record? A: yes, Q2: images? A: yes, and others, too, but rights are slightly more difficult.

Ecological Informatics: Building Solutions for Multi-Decadal Research.
Dr. Bill Michener, University of New Mexico
Long Term Ecological Research (LTER), http://www.lternet.edu/

LTER is required for slow or transient processes, episodic or infrequent events, trends, multi-factor, delayed effects.

Issues: data dispersion, field stations, museums, local agencies, individual collections - data entropy - basically, scientists lose or forget details about their datasets after publication... Data integration- different syntax (format), schema (model), semantics (meaning).

ecological informatics - status: some data archives, some tools (such as Morpho, Metacat, Kepler - a workflow system). Vegbank - value added database, societal and NSF effort.

future: Smith, Knapp, Collins in press - the global change hockey stick applies to population, temp, environmental nitrogen...

coupled science and cyberinfrastructure, facilitate access and use (usability of preserved data). Global communities of practice. Diverse scales and scopes. Enabling the scientists, whole life cycle data management, domain agnostic solutions....

citizen science toolkit - citizenscience.org

Empty Suits
Quentin Wheeler
(ppt issue so no slides)
Creating the taxonomic content for taxonomic knowledge - lack of funding for taxonomic work based on mistaken idea that we're done or can rely on history and historical data.

We only know 10% of the species on earth and we might lose 25-50% of the species within this century. How can we detect invasive species or find bioterrorism if we don't have a biodiversity baseline? Major discoveries every year in biodiversity (biggest stick bug found).

Still major, heated, furious debate on what a species is - still more than 2 dozen competing definitions. Tradition of argumentation... biological challenge and process challenges - fundamental problems. Disservice to turn taxonomy into a service for other areas of science instead of as for a basic science. More money into mobilizing bad data (databases) instead of generating good data and capturing new good data as it's created.

NSF - Planetary biodiversity inventories - required communities of taxonomists work together (new idea, generally taxonomists are very independent). Example: cat fishes - 200 ichthyologists working together - now can work together. Also information access in smaller or remote institutions. Remote microscopes.

Challenges - science is not the issue (phylogenetics has had a huge impact, and they have rigorous testable hypotheses)... it's more the culture of not collaborating, and the need for support through cyberinfrastructure.

Next steps: more on bugs. more on history and philosophy of science (to look at how marginalized to prevent future), sociologists to help work with rewards structure, change image of taxonomy, advertising...

saw video: Planet Bob.

Labels: biodiversity informatics, SLA2008

¶ 11:52 AM

Sunday, June 15, 2008

SLA2008: Charlie Rose interviews Vinton Cerf

At the opening general session. Vint Cerf is the Chief Internet Evangelist for Google

Charlie Rose starts by thanking us and saying he didn't realize we had an association (sigh!)

... start at the beginning...

ARPA was created... late 60s... arpanet packet switching... C2 interest and mobile comms

... Gore did as a Senator get funding for NSF net... then as vice president get commercial traffic on network...

...what is the internet today...
like road system or power distribution system - wherever and as much as you need. By 2010 want to get 3 billion on net ... mobile penetration is much quicker and conducive bcs no wires. Wants everyone on the planet to have access to the internet in 10 years... but the new technology doesn't supplant the old - in addition. Need broadband, too.

... what is the essence of the internet...
academic world the coin of the realm is knowledge so the design of the internet is completely open and this was during the Cold War. Over the past 30 years the openness has enabled innovation -- don't have to get permission... openness and freedom is the essence

... what is the significance of users being content providers - where are we going with social networks ...
10 hrs of video going into YouTube per minute... lots of content... when we are all sharing... the world's knowledge at your fingertips.

...digitizing of libraries...
googles have been working - at least at digitizing the imagery of books, and OCRing but we are very far from ... but things are being produced digitally now... have to stop thinking of digital objects as analogs of print. Sometimes these complex digital artifacts can't really be represented in print - increasing complexity of digital objects. deeply concerned about digital preservation - if the software is no longer available - sitting atop a pile of mouldering bits

... will google do that...
well google will do some, but other companies that own content might find that if they can no longer support the software that they can make content available on the web for others to find a way to access

... does the new president need to appoint a commission...
not the solution.. needs to be more distributed and there might be more than one solution

science data sharing - example GenBank where the journals required submission to archive prior to publication of articles, was a successful example - works that way for astro and atmospheric --

"some people say that information is power - baloney - information sharing is power"

delay and disruption tolerant protocols - test them on deep impact, then on ISS then open them to the world and share them to all countries sending stuff to space so then we can repurpose platforms and we can use the store and forward... like Phoenix talking to the Mars orbiters... and build a space network.

... what rules do we need internationally...
we're working with coordinating group... but there will be abuses so how will we work against that? various technological means, and then international agreement (which is complicated)...

but information security is a big deal...

...why can't we end spam...
plug the holes of the vulnerabilities in the browsers, better filtering

solar power to internet cafes and then use satellite internet - help people with the capital cost who want to set up internet cafes

2013-2014 to 75% internet penetration...

censorship?
we even censor here, but there is no country that doesn't have some access -- even Burma. In the long run necessary for development. In the long run democratizing agent - but potential hazard bcs [content pay walls prevent access and limit democratizing effect]

...how will search change....
if we are lucky go past statistical matching of strings and get to more semantics

...voice recognition...
BLEU scoring system for VR... google broke the .5 barrier so that an expert can't fix the translation without referring to the original.
"translate 'out of sight, out of mind' as 'invisible idiot'"

... will US continue to lead the way...
let me remove the myth that the internet was designed completely in the U.S.

(missed a bit)
... what's the next big idea
if I knew I'd be doing it... mobile is interesting - screen size of a 1928 tv and keyboard suitable for a 3ft tall person... make it into a universal remote control.

Labels: SLA2008

¶ 8:44 PM

Friday, June 13, 2008

More big moves in the business of the info world...

I'm a DIALOG fan - yes, this might surprise some. I was just searching petroleum abstracts (Tulsa) using Dialog Link this afternoon. See that's a database we'd never subscribe to, but I was logged on and searching away in a few seconds....

Dialog is being bought by ProQuest. I got a press release in my e-mail today from Beth Dempsey (it actually went to spam... but I caught it). I'm not sure about this... on the one hand, there is that remote chance that the CSA databases I love will be more accessible on DIALOG (right - don't forget that CSA is now ProQuest)... Hey, maybe Dialog will work even better with RefWorks...

Strange - Thomson bought ISI, Dialog, Gale.... and now only ISI remains...

ProQuest has a commitment to professional searchers (why is their online access built like it is? I mean, it's not bad, but not really great either)...

Don't take away my Dialog - just add new databases to it, and we'll all be happy :)

¶ 7:00 PM

Tuesday, June 10, 2008

EPA National Dialogue on Access to Environmental Information.

via SLA-dGI

"EPA has established a blog for one week, 9 June - 13 June, as a vehicle for receiving public comments on improving access to EPA information. The EPA Partner Blog [http://blog.epa.gov/partners/] is part of the EPA National Dialogue on Access to Environmental Information. The blog is set up to receive responses on five topics:

Understanding Information: Putting environmental information into context for our customers.
Finding Information: Making environmental information easier to find or access.
What Works: What is working for your organization?
Building to Share: How do we leverage our collective strengths and capabilities?
Going Beyond the Web: Reaching people who don’t have Internet access.

EPA says: "The information gathered on this site, combined with input gathered through the National Dialogue with EPA’s external partners and the public, will be used to develop a comprehensive multi-year strategy on access to environmental information.""

¶ 1:05 PM| (0) comments |cites (technorati) |

Sunday, June 08, 2008

Advice for the rank beginner part 4

So when last we spoke, you had a picture in NetDraw. Maybe a mess, maybe you can already see something interesting. What do I mean by interesting? Well, is it a densely connected ball or are there lots of little groups sitting around? Hm.

Components. These are groups of nodes connected to each other but not to the rest of the graph. If you were looking at co-authorship within an organization, and you didn't have any collaboration across departments, each of these might represent a department. They might represent research topics or areas in either co-author or co-citation networks. Find these in NetDraw using Analysis > components > and then mark them by color or by shape. My co-authorship network looks festive:

If you want to see the components one at a time look over on the right hand side of netdraw... see a dropdown box with ID, change that to components, then see how you have the list of components? then below that a > 0 ! then below that letters a i s c ^D R. The s will step through the components. You can then probably even turn the labels back on when you get reasonable clusters. And you can use the spring embedding layout to re-arrange things again.
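If you'd like to see what "components" means outside of NetDraw, here's a minimal sketch in Python. It is not how NetDraw does it internally (I don't know that); the function name and the sample edge list are made up for illustration.

```python
from collections import defaultdict

def connected_components(edges):
    """Group nodes into components: sets of nodes connected to each
    other but not to the rest of the graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        # Walk outward from an unvisited node to collect its whole component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    # Largest component first.
    return sorted(components, key=len, reverse=True)

# Hypothetical co-authorship ties: two groups that never write together.
edges = [("Adams", "Baker"), ("Baker", "Chen"), ("Diaz", "Evans")]
components = connected_components(edges)
for number, members in enumerate(components, start=1):
    print(number, sorted(members))
```

Each set that comes back is one of the groups you would color or step through in NetDraw.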

Ok, now let's look for nodes that are really well connected. In NetDraw you can hit one button and calculate a bunch of different centrality measures: Analysis > centrality measures... Of these, Degree is the most straightforward. A node's degree centrality is the number of connections it has. You might have some components that are complete - all of the connections that could exist do - so all of the nodes in that component will have the same degree centrality. With me? Right. So then, well, this might just be the co-authors on the same paper. Well, Sitkis gave you line weights, too, so now you can see if there are two or more author1 - author2 links. You can make the lines heavier or put numbers next to them. Or you might have a star type thing. Or you might just have a mess with a few nodes of high centrality.
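Degree centrality is simple enough to compute by hand, so here's a small Python sketch of it, plus a filter for the heavier lines. The function names, the sample data and the cutoff of 2 are my own assumptions for illustration, not anything NetDraw or Sitkis exposes.

```python
def degree_centrality(weighted_edges):
    """Number of distinct neighbors each node has (the line weight is ignored)."""
    neighbors = {}
    for a, b, _weight in weighted_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(others) for node, others in neighbors.items()}

def repeated_ties(weighted_edges, minimum=2):
    """Keep only the ties that occur at least `minimum` times."""
    return [(a, b, w) for a, b, w in weighted_edges if w >= minimum]

# Hypothetical weighted ties: (author1, author2, number of shared papers).
edges = [
    ("Adams", "Baker", 3),
    ("Adams", "Chen", 1),
    ("Baker", "Chen", 1),
    ("Adams", "Diaz", 1),
]
degree = degree_centrality(edges)
for node in sorted(degree, key=degree.get, reverse=True):
    print(node, degree[node])
print(repeated_ties(edges))
```

In this toy data Adams, Baker and Chen form a complete little triangle, and the weight is what tells you Adams and Baker have written together more than once.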

I know a little bit (but not that much) more than I'm saying, but all I promised was advice for the rank beginner.

If you want to look at co-authors only, you can export your RefWorks database into a tab delimited file. Import into excel. Delete off everything but the authors. You'll need quotes around each author name since there's a space. You can use the concatenate command so you end up with:
"author fi mi" "author2"...
Spaces between, NOT tabs. Copy into your handy text editor so there's no weirdness. So then you can use the same header as you got from Sitkis, but you need to know how many nodes. If you took your whole RefWorks database, you can add this up pretty quickly using lookup by author.... Maybe there's a better way. In any case this would not be good in high energy physics - for obvious reasons (papers there can have hundreds of authors).
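One possible "better way" is to let a short script do the pairing and the node counting. This is only a sketch: it assumes you already have each paper's authors as a list of names, it reuses the header line from the Sitkis export shown in part 3, and it counts only authors who have at least one co-author (sole authors never show up in an edge list). The function name and sample papers are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthor_dl(papers):
    """Build dl edge-list text from a list of author lists, one per paper."""
    pair_counts = Counter()
    for names in papers:
        # Every pair of authors on the same paper gets one tie.
        for a, b in combinations(sorted(set(names)), 2):
            pair_counts[(a, b)] += 1

    # Count only the authors that actually appear in an edge.
    nodes = {name for pair in pair_counts for name in pair}
    lines = [
        f"dl n={len(nodes)} format=edgelist1 labels embedded type=symmetric",
        "data:",
    ]
    for (a, b), weight in sorted(pair_counts.items()):
        # Quotes around each name because of the spaces; spaces between, not tabs.
        lines.append(f'"{a}" "{b}" {weight}')
    return "\n".join(lines)

# Hypothetical author lists, as they might come out of a citation manager.
papers = [
    ["Pikas CK", "Smith J"],
    ["Pikas CK", "Smith J", "Lee A"],
    ["Jones B"],
]
dl_text = coauthor_dl(papers)
print(dl_text)
```

I haven't checked how UCInet or NetDraw react to every variation of this header, so compare the output against a file you exported from Sitkis before trusting it.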

Coming up: the conclusion... where to go for more info.

¶ 8:35 PM| (0) comments |cites (technorati) |

Saturday, June 07, 2008

Advice for the rank beginner part 3

Ok, so after last post hopefully you have some ideas of which network you are interested in. Co-citation is easy to get from sitkis, but co-authorship might be easiest to understand.

Use citations > co-citation network to export the file (see 2.2 in the pdf user's guide that comes in the zip file you download with the sitkis software). The exported file is in the .dl format. This can be used in UCInet or NetDraw from Analytic Technologies as well as in Pajek and other SNA programs. There are several ways to do dl files but what you get from sitkis is:
dl n=546 format=edgelist1 labels embedded type=symmetric
data:
"nodelabel1" "nodelabel2" weight

Reading that header piece by piece: dl signals that it's a dl file; n=546 says there are 546 nodes; format=edgelist1 means that you define the network by listing the edges (connections between actors or nodes), one per line, here with a weight at the end; labels embedded means that the node labels themselves are used in the edge list, instead of, say, numbering all of the nodes, defining the connections by number, and listing the labels separately; and type=symmetric means the connections are not directional.
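To make the format concrete, here's a minimal Python sketch that reads a file shaped like the one above. It only handles this one flavor of dl (edge list, embedded quoted labels, optional weight), and the function name and sample text are mine, not part of Sitkis or UCInet.

```python
import shlex

def read_dl_edgelist(text):
    """Return (declared node count, list of (label1, label2, weight))."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Pull n=... out of the header line.
    declared_nodes = None
    for token in lines[0].split():
        if token.lower().startswith("n="):
            declared_nodes = int(token[2:])

    # Everything after the "data:" line is one edge per line.
    data_start = next(
        i for i, line in enumerate(lines) if line.lower().startswith("data:")
    ) + 1
    edges = []
    for line in lines[data_start:]:
        # shlex keeps a quoted label with spaces together as one piece.
        parts = shlex.split(line)
        weight = float(parts[2]) if len(parts) > 2 else 1.0
        edges.append((parts[0], parts[1], weight))
    return declared_nodes, edges

sample = '''dl n=3 format=edgelist1 labels embedded type=symmetric
data:
"Pikas CK" "Smith J" 2
"Lee A" "Pikas CK" 1
'''
declared_nodes, edges = read_dl_edgelist(sample)
print(declared_nodes, edges)
```

A quick sanity check you can do with this is to compare the declared n against the number of distinct labels that actually appear in the edges.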

So, let's just take a look at what we've got by using NetDraw. In fact, you could probably use only NetDraw and do much of what you want to do.

Open netdraw
click on the folder to open a file
choose dl file format
browse and pick your file
this is a one-mode network and we'll ignore reflexive ties

you'll see a big jumble at first:

So then you can clean it up a bit:

make the font size and symbol size smaller. use the buttons on the toolbar A A and S S (or use the properties > symbol ... )
use a spring embedding algorithm for layout - this puts connected things closer together and pushes apart things that aren't connected. file > layout > graph theoretic > spring embedding (defaults) > ok
turn off arrow heads (as they are meaningless for your undirected or symmetric graph) button that looks like an arrow
drop isolates (these are guys who aren't connected to anyone) use the ~~Iso~~ button
click on individual nodes to move them slightly to show labels

Yep, still a mess. Let's turn off the labels so that we can see the structure a bit better. Use the L button. Really just a jumble in the middle.

Oh, yeah - if you want to get a picture from netdraw - use file > save diagram as. If you are writing a paper using Word, then pick metafile for the clearest and most scalable picture. Otherwise probably jpg will work. BTW - this is a search from 1996 - 2008 on WoS for blog* OR weblog*, refined to "articles"

If you want to stop and then come back, you can save a VNA file in NetDraw. (file > save data as > vna)

Co-authorship.
Export a 2-mode network from Sitkis (authors-articles), then use UCInet to get a one-mode with author-author ties.
This is a little hosed - but make sure you save the file from sitkis with a .dl extension. Then you have to "import" data into ucinet to get it into two files .##d and .##h. Files will start proliferating so you might make a note of what these all are. Then you can use the data > affiliations (2 mode to 1 mode) using rows -- this will give you the two files that are the author-author network. When you import into NetDraw, you'll pick ucinet format (##d, ##h) and 1 mode.
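The idea behind going from 2 mode to 1 mode is easier to see in a few lines of code than in the menus. This Python sketch counts, for each pair of authors, how many articles they share; it illustrates the concept and is not a description of what UCInet does internally. The names and data are made up.

```python
from collections import Counter
from itertools import combinations

def project_authors(author_article_pairs):
    """Turn (author, article) ties into author-author ties,
    weighted by the number of articles the two authors share."""
    authors_by_article = {}
    for author, article in author_article_pairs:
        authors_by_article.setdefault(article, set()).add(author)

    ties = Counter()
    for authors in authors_by_article.values():
        for a, b in combinations(sorted(authors), 2):
            ties[(a, b)] += 1
    return dict(ties)

# Hypothetical two-mode data: which author is on which paper.
two_mode = [
    ("Adams", "paper1"), ("Baker", "paper1"),
    ("Adams", "paper2"), ("Baker", "paper2"), ("Chen", "paper2"),
]
one_mode = project_authors(two_mode)
print(one_mode)
```

The articles disappear from the result; all that is left is who wrote with whom, and how often.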

Still to come

some simple measures and looking at your graph
using a citation manager to get co-authorship data
where to go for more information (and people who know more about this than I ever will)

¶ 11:54 AM| (0) comments |cites (technorati) |

Thursday, June 05, 2008

Advice for the rank beginner part 2

Ok, so I had you go get the data before really talking about what you might do with it. Let's look at what we have.

article <has author> author1, author2,... authorn
authorn <has address> institution, city, state, country, zip code
article <cites> article1, article2,... articlen
... other citation stuff like journals, dates...

we can build a co-authorship network.
article1 <has author> author1
article1 <has author> author2
therefore there is a connection between author1 --- author2, some similarity or communication between the two.

you can extend this
article1 <has author> author1 and author1 <has address> institution1, country1
article1 <has author> author2 and author2 <has address> institution2, country2
Therefore there is a connection institution1 -- institution2 or country1 -- country2
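The same pairing trick works at any level, which a short Python sketch can show. The record layout here (a dictionary per author with name, institution and country) is an assumption for illustration; your real data will need cleaning before it looks this tidy.

```python
from itertools import combinations

def linked_pairs(articles, key):
    """Pairs of values (names, institutions, countries...) that
    appear together on at least one article."""
    pairs = set()
    for authors in articles.values():
        values = sorted({author[key] for author in authors})
        pairs.update(combinations(values, 2))
    return pairs

# Hypothetical articles with their authors' addresses.
articles = {
    "article1": [
        {"name": "author1", "institution": "institution1", "country": "country1"},
        {"name": "author2", "institution": "institution2", "country": "country2"},
    ],
    "article2": [
        {"name": "author1", "institution": "institution1", "country": "country1"},
        {"name": "author3", "institution": "institution3", "country": "country1"},
    ],
}
print(linked_pairs(articles, "name"))
print(linked_pairs(articles, "institution"))
print(linked_pairs(articles, "country"))
```

Notice that article2 adds an institution link but no country link, since both authors are in the same country.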

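Co-citation is
article1 <cites> article2
article1 <cites> article3
therefore there might be some similarity between articles 2 and 3.
(The flip side, where article1 and article3 both cite article2 so there might be some similarity between articles 1 and 3, is usually called bibliographic coupling.)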
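Here's a rough Python sketch that counts both kinds of link from a table of who cites what: pairs of articles that cite the same work, and pairs of works that turn up together in the same reference list. The function names and sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def shared_reference_counts(cites):
    """Pairs of citing articles -> how many references they have in common
    (usually called bibliographic coupling)."""
    counts = {}
    for a, b in combinations(sorted(cites), 2):
        shared = len(set(cites[a]) & set(cites[b]))
        if shared:
            counts[(a, b)] = shared
    return counts

def cited_together_counts(cites):
    """Pairs of cited works -> how many articles cite both of them
    (co-citation in the usual bibliometric sense)."""
    counts = Counter()
    for references in cites.values():
        for a, b in combinations(sorted(set(references)), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical data: citing article -> the works it cites.
cites = {
    "article1": ["article2", "article4"],
    "article3": ["article2", "article4"],
    "article5": ["article4"],
}
print(shared_reference_counts(cites))
print(cited_together_counts(cites))
```

Either set of counts can be written out as a weighted edge list and drawn like any other network.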

You can also call both strings of citations vectors and then measure their similarity using some sort of Euclidean measure or other, but I haven't actually done that so I'll leave it for someone else to explain.
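For the curious, here's one possible reading of the vector idea as a Python sketch. It assumes the simplest setup I can think of: each article becomes a row of 1s and 0s over all the cited works, and two rows are compared with a Euclidean distance and a cosine similarity. Treat it as an illustration of the arithmetic, not as a recommended method.

```python
import math

def citation_vectors(cites):
    """One 0/1 vector per article, with one position per cited work."""
    vocabulary = sorted({ref for refs in cites.values() for ref in refs})
    return {
        article: [1 if ref in refs else 0 for ref in vocabulary]
        for article, refs in cites.items()
    }

def euclidean_distance(u, v):
    """Smaller means the two reference lists are more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    """Closer to 1 means the two reference lists are more alike."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical reference lists that overlap on two of four works.
cites = {"article1": {"a", "b", "c"}, "article3": {"b", "c", "d"}}
vectors = citation_vectors(cites)
print(euclidean_distance(vectors["article1"], vectors["article3"]))
print(cosine_similarity(vectors["article1"], vectors["article3"]))
```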

Still to come (and not tonight!)

exporting from Sitkis
importing into UCInet or NetDraw (comes with UCInet)
drawing pretty pictures
some simple measures
using a citation manager to get co-authorship data
where to go for more information (and people who know more about this than I ever will)

¶ 11:28 PM| (0) comments |cites (technorati) |

Some advice for the rank beginner in citation analysis part 1

I keep realizing how much I have to learn in citation analysis and social network analysis. So I think that I have nothing to offer; yet, I do know a lot, and I've learned some lessons the hard way. I'll try to give back a little here because it will probably be good for me and it might help someone else.

This post is for librarians and other information professionals who might want to dabble in or dip a toe into citation analysis and who are a bit lost with all of the massive amounts of advice and help out there.

First, I'm talking about studying the structure that is created through the linking of people or works by co-authorship or citation (in some fashion). It's fairly straightforward to calculate someone's or some organization's h index (using two competing tools and some free things), and it's fairly straightforward to tally up citations, although it is nigh impossible to be comprehensive in just about any field. What is more complicated is building a network of relationships between authors and using this network to understand the collaborations and potential information flows. So that's what I'm talking about. You might use this within an organization to see the patterns of how people in one department write papers with people in another. You might use this to look at some sort of similarity based on who all cites the same paper. This sort of analysis is a value added service that librarians and information professionals can provide for their organizations.

The data
Where do you get the data? Well, Web of Science (henceforth WoS) from ~~ISI~~, ~~Thomson~~, ~~Thomson Scientific~~ Thomson Reuters is still the best choice. You will need a site license to this or the CD-ROMs because this is way expensive for DIALOG. Yes, there are competitors, but many of the tools are built to work with WoS data. If you think you're going to find cleaner data, hah! Let me know how that works for you. Ok, but, it is still a bit dirty and it has those pesky known weaknesses: western bias, journal articles only (so CS and some areas of engineering are underrepresented), not intended to be comprehensive. Scopus, in my experience, has crap for data (they messed up and at one point marked a bunch of stuff from the 1980s as from 2008, they think MPOW is in New Jersey (I mean really - have you not heard of our parent institution???))... As for Google Scholar, there are some tools that use it, but we don't know how comprehensive it is (or what it covers), how far back it goes, how frequently it's updated...

Now, for co-authorship, you can really use just about anything or some combination, but we'll talk about that later.

What Software Do I Need?

Sitkis - http://users.tkk.fi/~hschildt/sitkis/index.html
A simple text editor if you need to do more than 500 references
Microsoft Access
UCInet (free trial is fine for now) - http://www.analytictech.com/downloaduc6.htm
Optionally, RefWorks or EndNote or equivalent if you want to do co-authorship using data from multiple databases

Preparing the Data
Do your search in WoS. Mark the records and save the full record including citations as plain text. I believe it will only do 500 at a time, so you'll have to paste these files together (take off the preliminaries like FN and the EF (end of file) for the middle records). You can do it in Sitkis, or you can concatenate at the command line, whatever works. Here's an example of a record (note they do NOT have my zip code and city right, and there's an extra letter in my e-mail, sigh):

FN ISI Export Format
VR 1.0
PT J
AU Pikas, CK
TI Blog searching for competitive intelligence, brand image, and
reputation management
SO ONLINE
LA English
DT Article
C1 Johns Hopkins Univ, Appl Phys Lab, Baltimore, MD 21218 USA.
RP Pikas, CK, Johns Hopkins Univ, Appl Phys Lab, Baltimore, MD 21218 USA.
EM cchristina.pikas@jhuapl.edu
NR 0
TC 5
PU ONLINE INC
PI WILTON
PA 213 DANBURY RD, WILTON, CT 06897-4007 USA
SN 0146-5422
J9 ONLINE
JI Online
PD JUL-AUG
PY 2005
VL 29
IS 4
BP 16
EP 21
PG 6
SC Computer Science, Information Systems; Information Science & Library
Science
GA 936SE
UT ISI:000229874900006
ER

EF
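If you'd rather script the pasting together than do it by hand, here's a minimal Python sketch. It assumes files shaped like the record above: an FN/VR preamble, records that end with ER, and a final EF. It also assumes any line that doesn't start with a two-character tag is a continuation of the field before it, which matches this example but may not hold for every export. The function names are mine, not part of Sitkis.

```python
import re

# A field line starts with a two-character tag such as AU, C1 or J9.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])(?: (.*))?$")

def merge_wos_exports(file_texts):
    """Join several exports: one preamble at the top, one EF at the end."""
    merged = []
    for index, text in enumerate(file_texts):
        in_preamble = index > 0  # keep the preamble of the first file only
        for line in text.splitlines():
            if in_preamble:
                if line.startswith(("FN ", "VR ")) or not line.strip():
                    continue
                in_preamble = False
            if line.strip() == "EF":
                continue
            merged.append(line)
    merged.append("EF")
    return "\n".join(merged) + "\n"

def parse_wos_records(text):
    """Return one dict per record, mapping each tag to its list of lines."""
    records, current, last_tag = [], {}, None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        match = TAG_LINE.match(raw)
        if match is None:
            # Continuation line: belongs to the most recent field.
            if last_tag is not None:
                current[last_tag].append(raw.strip())
            continue
        tag, value = match.group(1), match.group(2) or ""
        if tag in ("FN", "VR", "EF"):
            last_tag = None  # file-level lines, not part of any record
        elif tag == "ER":
            records.append(current)  # ER closes the current record
            current, last_tag = {}, None
        else:
            current.setdefault(tag, []).append(value)
            last_tag = tag
    return records

# Two tiny made-up export files in the same shape as the record above.
first = (
    "FN ISI Export Format\nVR 1.0\nPT J\nAU Pikas, CK\n"
    "TI Blog searching for competitive intelligence, brand image, and\n"
    "reputation management\nPY 2005\nER\n\nEF\n"
)
second = "FN ISI Export Format\nVR 1.0\nPT J\nAU Smith, J\nPY 2006\nER\n\nEF\n"

merged = merge_wos_exports([first, second])
records = parse_wos_records(merged)
print(len(records), [record["AU"] for record in records])
```

Parsing the merged text back into records is a cheap way to confirm you didn't lose anything in the paste.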

When you install Sitkis you'll get a manual and a user guide. The manual helps you with setting up and importing data. I followed the manual closely and didn't have any problems.

Coming up:

Types of analysis
exporting from Sitkis
importing into UCInet or NetDraw (comes with UCInet)
drawing pretty pictures
some simple measures
using a citation manager to get co-authorship data
where to go for more information (and people who know more about this than I ever will)

¶ 10:15 PM| (0) comments |cites (technorati) |